Daxin Tan

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Jun 01, 2026
Via arXiv

A Survey of Audio Reasoning in Multimodal Foundation Models

May 20, 2026
Via arXiv

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

May 07, 2026
Via arXiv

Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models

Mar 10, 2026
Via arXiv

PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs

Jan 23, 2026
Via arXiv

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Jan 15, 2026
Via arXiv

AEQ-Bench: Measuring Empathy of Omni-Modal Large Models

Jan 15, 2026
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv

Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data

Sep 17, 2024
Via arXiv

Exploring SSL Discrete Tokens for Multilingual ASR

Sep 13, 2024
Via arXiv